Squibs: From Annotator Agreement to Noise Models

Authors

  • Beata Beigman Klebanov
  • Eyal Beigman
Abstract

This article discusses the transition from annotated data to a gold standard, that is, a subset that is sufficiently noise-free with high confidence. Unless appropriately reinterpreted, agreement coefficients do not indicate the quality of the data set as a benchmarking resource: High overall agreement is neither sufficient nor necessary to distill some amount of highly reliable data from the annotated material. A mathematical framework is developed that allows estimation of the noise level of the agreed subset of annotated data, which helps promote cautious benchmarking.
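As a concrete illustration of the abstract's central caution, consider a toy setting (a minimal sketch, not the authors' actual noise model): two annotators label "easy" items correctly and guess independently and uniformly on "hard" items. Overall agreement can stay high while a non-trivial share of the agreed subset is chance coincidence rather than stable judgment.

```python
# A minimal sketch, not the authors' model: binary labels, two annotators.
# Assumption: both annotators label "easy" items correctly, and each
# guesses independently and uniformly at random on "hard" items.

def agreed_subset_noise(hard_fraction: float) -> tuple[float, float]:
    """Return (observed agreement, share of agreed items due to chance)."""
    easy = 1.0 - hard_fraction
    chance_agree = hard_fraction * 0.5   # two fair coins match half the time
    observed_agreement = easy + chance_agree
    # Chance agreements land in the agreed subset alongside genuine ones.
    noise_share = chance_agree / observed_agreement
    return observed_agreement, noise_share

for h in (0.1, 0.2, 0.4):
    a_o, noise = agreed_subset_noise(h)
    print(f"hard={h:.0%}  agreement={a_o:.2f}  chance share of agreed subset={noise:.1%}")
```

With 20% hard items the annotators still agree on 90% of the data, yet roughly one agreed item in nine is a coin-flip coincidence rather than a reliable judgment; estimating exactly this kind of residual noise in the agreed subset is what the article's framework formalizes.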


Related Articles

Exploiting Intra-Annotator Rating Consistency Through Copeland's Method for Estimation of Ground Truth Labels in Couples' Therapy

Behavioral and mental health research and its clinical applications widely rely on quantifying human behavioral expressions. This often requires human-derived behavioral annotations, which tend to be noisy, especially when the psychological objects of interest are latent and subjective in nature. This paper focuses on exploiting multiple human annotations toward improving reliability of the ens...
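The Copeland's method named in the title is a pairwise-majority aggregation rule. As a rough illustration (a generic sketch of the method, not the paper's pipeline; the session names are made up), each item earns a point for every rival it beats in a majority of annotators' rankings:

```python
from itertools import combinations

def copeland_ranking(rankings: list[list[str]]) -> list[str]:
    """Aggregate annotators' rankings by Copeland's method: +1 for each
    pairwise majority win, -1 for each loss, then sort by score."""
    items = rankings[0]
    score = {x: 0 for x in items}
    for x, y in combinations(items, 2):
        wins = sum(r.index(x) < r.index(y) for r in rankings)
        if wins * 2 > len(rankings):       # majority prefers x over y
            score[x] += 1; score[y] -= 1
        elif wins * 2 < len(rankings):     # majority prefers y over x
            score[y] += 1; score[x] -= 1
    return sorted(items, key=lambda x: -score[x])

# Three annotators rank three hypothetical therapy sessions.
print(copeland_ranking([["s1", "s2", "s3"],
                        ["s2", "s1", "s3"],
                        ["s1", "s3", "s2"]]))   # -> ['s1', 's2', 's3']
```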


Learning part-of-speech taggers with inter-annotator agreement loss

In natural language processing (NLP) annotation projects, we use inter-annotator agreement measures and annotation guidelines to ensure consistent annotations. However, annotation guidelines often make linguistically debatable and even somewhat arbitrary decisions, and inter-annotator agreement is often less than perfect. While annotation projects usually specify how to deal with linguistically ...
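One generic way to read the title's "inter-annotator agreement loss" (a sketch under that assumption, not necessarily the paper's formulation) is a per-token weight on the training loss, so that tags the annotators contested pull on the model less:

```python
import math

def agreement_weighted_nll(tag_probs: list[float],
                           agreement: list[float]) -> float:
    """Negative log-likelihood of each token's annotated tag, scaled by
    the fraction of annotators who chose that tag (weights in (0, 1])."""
    assert len(tag_probs) == len(agreement)
    return sum(-w * math.log(p)
               for p, w in zip(tag_probs, agreement)) / len(tag_probs)

# A unanimously annotated token counts fully; a contested one, half.
print(agreement_weighted_nll(tag_probs=[0.9, 0.6], agreement=[1.0, 0.5]))
```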


Supersense tagging with inter-annotator disagreement

Linguistic annotation underlies many successful approaches in Natural Language Processing (NLP), where the annotated corpora are used for training and evaluating supervised learners. The consistency of annotation limits the performance of supervised models, and thus a lot of effort is put into obtaining high-agreement annotated datasets. Recent research has shown that annotation disagreement is...


Squibs and Discussions: Evaluating Discourse and Dialogue Coding Schemes

Agreement statistics play an important role in the evaluation of coding schemes for discourse and dialogue. Unfortunately there is a lack of understanding regarding appropriate agreement measures and how their results should be interpreted. In this article we describe the role of agreement measures and argue that only chance-corrected measures that assume a common distribution of labels for all...
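For a sense of what "chance-corrected measures that assume a common distribution of labels" means, here is Scott's pi for two coders (one member of that family, shown as an illustrative sketch): expected agreement is computed from a single label distribution pooled over both coders.

```python
from collections import Counter

def scotts_pi(coder_a: list[str], coder_b: list[str]) -> float:
    """Scott's pi: (observed - expected) / (1 - expected), with expected
    agreement taken from a label distribution pooled over both coders."""
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    pooled = Counter(coder_a) + Counter(coder_b)
    expected = sum((c / (2 * n)) ** 2 for c in pooled.values())
    return (observed - expected) / (1 - expected)

# Skewed labels: 80% raw agreement, yet the chance-corrected score is
# slightly negative, because most matches are expected by chance alone.
a = ["yes"] * 9 + ["no"]
b = ["yes"] * 8 + ["no", "yes"]
print(round(scotts_pi(a, b), 3))   # -> -0.111
```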


Community annotation experiment for ground truth generation for the i2b2 medication challenge

OBJECTIVE: Within the context of the Third i2b2 Workshop on Natural Language Processing Challenges for Clinical Records, the authors (also referred to as 'the i2b2 medication challenge team' or 'the i2b2 team' for short) organized a community annotation experiment. DESIGN: For this experiment, the authors released annotation guidelines and a small set of annotated discharge summaries. They aske...



Journal:

Volume:   Issue:

Pages:

Publication year: 2009